Finding Speaker Face Region by Audiovisual Correlation

Authors

  • Yuyu Liu
  • Yoichi Sato
Abstract

The ability to find the speaker face region in a video is important in various application areas. In this work, we develop a novel technique to find this region robustly against different views and complex backgrounds using gray images only. The main thrust of this technique is to integrate audiovisual correlation analysis into an image segmentation framework to extract the speaker face region. We first analyze the video in a time window and evaluate the audiovisual correlation locally at each pixel position using a novel statistical measure based on Quadratic Mutual Information. Because only local visual information is used in this stage, the analysis is robust against changes in the view of the human face. The analyzed correlation is then incorporated into Graph Cut-based image segmentation, which optimizes an energy function defined over multiple video frames. Because this process finds the globally optimal segmentation with image information balanced, we can extract a reliable region aligned to real visual boundaries. Experimental results demonstrate the effectiveness and robustness of our method.
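The per-pixel correlation stage described above can be illustrated with a minimal sketch. This is a generic Parzen-window estimate of Euclidean-distance Quadratic Mutual Information (QMI) between two 1-D sample sequences, not the authors' exact statistical measure; the function names, the Gaussian kernel choice, and the bandwidth `sigma` are illustrative assumptions. In the paper's setting, `x` would be an audio feature over the time window and `y` the intensity variation of one pixel over the same window.

```python
import numpy as np

def gauss(d, var):
    # 1-D Gaussian kernel evaluated at pairwise differences d, variance var
    return np.exp(-d ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def quadratic_mutual_information(x, y, sigma=0.5):
    """Parzen-window estimate of Euclidean-distance QMI between two
    equal-length 1-D sequences x and y (illustrative sketch only).

    QMI = integral of (p(x,y) - p(x)p(y))^2, which has a closed form
    in pairwise kernel evaluations when densities are Parzen estimates.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    var = 2.0 * sigma ** 2  # variance of two convolved kernels
    kx = gauss(x[:, None] - x[None, :], var)  # pairwise kernels in x
    ky = gauss(y[:, None] - y[None, :], var)  # pairwise kernels in y
    v_joint = np.mean(kx * ky)                       # joint term
    v_marg = np.mean(kx) * np.mean(ky)               # product-of-marginals term
    v_cross = np.mean(kx.mean(axis=1) * ky.mean(axis=1))  # cross term
    return v_joint + v_marg - 2.0 * v_cross
```

A pixel whose intensity variation co-varies with the audio (e.g. a mouth-region pixel) yields a larger QMI value than an unrelated background pixel; thresholding or weighting these per-pixel scores is what feeds the subsequent Graph Cut segmentation.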


Related articles

Simulation of talking faces in the human brain improves auditory speech recognition.

Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain op...


Audiovisual-based adaptive speaker identification

An adaptive speaker identification system is presented in this paper, which aims to recognize speakers in feature films by exploiting both audio and visual cues. Specifically, the audio source is first analyzed to identify speakers using a likelihood-based approach. Meanwhile, the visual source is parsed to recognize talking faces using face detection/recognition and mouth tracking techniques. ...


Multifactor Fusion for Audio-Visual Speaker Recognition

In this paper we propose a multifactor hybrid fusion approach for enhancing security in audio-visual speaker verification. Speaker verification experiments conducted on two audiovisual databases, VidTIMIT and UCBN, show that multifactor hybrid fusion, combining feature-level fusion of lip-voice features and face-lip-voice features at score level, is indeed a powerful technique for spe...


Speaking faces for face-voice spe

In this paper, we describe an approach for an animated speaking face synthesis and its application in modeling impostor/replay attack scenarios for face-voice based speaker verification systems. The speaking face reported here learns the spatiotemporal relationship between speech acoustics and MPEG4 compliant facial animation points. The influence of articulatory, perceptual, and prosodic acous...


Adaptive speaker identification with audiovisual cues for movie content analysis

An adaptive speaker identification system which employs both audio and visual cues is proposed in this work for movie content analysis. Specifically, a likelihood-based approach is first applied for speaker identification using pure speech data, and techniques such as face detection/recognition and mouth tracking are applied for talking face recognition using pure visual data. These two informa...



Publication date: 2008